Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

That is, active learning when the labels can be queried from two different sources. One, the Strong Oracle (denoted O in the paper), is costly but reliable, while the other, the Weak Oracle (denoted W), is much cheaper but locally unreliable. The idea is to minimize the use of O by relying on the labels from W as much as possible. The authors present a new algorithm for learning in this setting.


Learning to categorize objects using temporal coherence

Neural Information Processing Systems

The invariance of an object's identity as it is transformed over time provides a powerful cue for perceptual learning. We present an unsupervised learning procedure which maximizes the mutual information between the representations adopted by a feed-forward network at consecutive time steps. We demonstrate that the network can learn, entirely unsupervised, to classify an ensemble of several patterns by observing pattern trajectories, even though there are abrupt transitions from one object to another between trajectories. The same learning procedure should be widely applicable to a variety of perceptual learning tasks.


A Gradient-Based Boosting Algorithm for Regression Problems

Neural Information Processing Systems

Adaptive boosting methods are simple modular algorithms that operate as follows. Let g: X → Y be the function to be learned, where the label set Y is finite, typically binary-valued. The algorithm uses a learning procedure, which has access to n training examples, {(x_1, y_1), ..., (x_n, y_n)}, drawn randomly from X × Y according to distribution D; it outputs a hypothesis f: X → Y, whose error is the expected value of a loss function on f(x), g(x), where x is chosen according to D. Given ε, δ > 0 and access to random examples, a strong learning procedure outputs with probability 1 − δ a hypothesis with error at most ε, with running time polynomial in 1/ε, 1/δ and the number of examples. A weak learning procedure satisfies the same conditions, but where ε need only be better than random guessing. Schapire (1990) showed that any weak learning procedure, denoted WeakLearn, can be efficiently transformed ("boosted") into a strong learning procedure. The AdaBoost algorithm achieves this by calling WeakLearn multiple times, in a sequence of T stages, each time presenting it with a different distribution over a fixed training set and finally combining all of the hypotheses. The algorithm maintains a weight w_t^i for each training example i at stage t, and a distribution D_t is computed by normalizing these weights.
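The staged reweighting loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names and the decision-stump weak learner are illustrative choices.

```python
import math

def ada_boost(X, y, weak_learn, T):
    """Minimal AdaBoost sketch for binary labels y in {-1, +1}.

    weak_learn(X, y, D) plays the role of WeakLearn: it returns a
    hypothesis h (a callable) trained under the distribution D.
    """
    n = len(X)
    D = [1.0 / n] * n                    # initial distribution D_1 over examples
    hypotheses = []                      # (alpha_t, h_t) pairs
    for _ in range(T):
        h = weak_learn(X, y, D)
        # weighted error of h under the current distribution D_t
        eps = sum(D[i] for i in range(n) if h(X[i]) != y[i])
        eps = min(max(eps, 1e-12), 1 - 1e-12)     # avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        hypotheses.append((alpha, h))
        # reweight: misclassified examples gain weight, then normalize to get D_{t+1}
        w = [D[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(n)]
        Z = sum(w)
        D = [wi / Z for wi in w]
    # final hypothesis: weighted vote of the T weak hypotheses
    return lambda x: 1 if sum(a * h(x) for a, h in hypotheses) >= 0 else -1

def stump_learner(X, y, D):
    """Illustrative weak learner: best 1-D threshold stump under D."""
    best = None
    for thr in sorted(set(X)):
        for sign in (1, -1):
            err = sum(D[i] for i in range(len(X))
                      if (sign if X[i] >= thr else -sign) != y[i])
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, thr, sign = best
    return lambda x, t=thr, s=sign: s if x >= t else -s
```

For example, boosting threshold stumps on the points 0..3 with labels (-1, -1, +1, +1) recovers the separating threshold after a few stages.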


N-BEATS : Time-Series Forecasting with Neural Basis Expansion

#artificialintelligence

There's one thing that makes Time-Series Forecasting special. It was the only area of Data Science where Deep Learning and Transformers didn't decisively outperform the other models. Let's use the prestigious Makridakis M-competitions as a benchmark -- a series of large-scale challenges that showcase the latest advances in the time-series forecasting area. In the fourth iteration of the competition, known as M4, the winning solution was ES-RNN [2], a hybrid LSTM & Exponential Smoothing model developed by Uber. Interestingly, the six pure ML models (out of 57 submissions) performed so poorly that they barely surpassed the competition baseline.


One-Step Abductive Multi-Target Learning with Diverse Noisy Samples

Yang, Yongquan

arXiv.org Artificial Intelligence

One-step abductive multi-target learning (OSAMTL) [1] was proposed to alleviate the situation where it is often difficult or even impossible for experts to manually produce accurate ground-truth labels, which leads to labels with complex noise for a specific learning task. On an H. pylori segmentation task over medical histopathology whole-slide images [1,2], OSAMTL has been shown to possess significant potential in handling complex noisy labels, using logical rationality evaluations based on the logical assessment formula (LAF) [1]. However, OSAMTL is not suitable for the situation of learning with diverse noisy samples. In this paper, we aim to address this issue. Firstly, we give a definition of diverse noisy samples (DNS). Secondly, based on the given definition of DNS, we propose one-step abductive multi-target learning with DNS (OSAMTL-DNS). Finally, we provide analyses of OSAMTL-DNS compared with the original OSAMTL.


Unsupervised meta-learning: learning to learn without supervision

AIHub

The history of machine learning has largely been a story of increasing abstraction. In the dawn of ML, researchers spent considerable effort engineering features. As deep learning gained popularity, researchers then shifted towards tuning the update rules and learning rates for their optimizers. Recent research in meta-learning has climbed one level of abstraction higher: many researchers now spend their days manually constructing task distributions, from which they can automatically learn good optimizers. What might be the next rung on this ladder?


N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

Oreshkin, Boris N., Carpov, Dmitri, Chapados, Nicolas, Bengio, Yoshua

arXiv.org Machine Learning

We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on the well-known M4 competition dataset containing 100k time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on the M4 dataset strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without loss in accuracy.
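The backward and forward residual links described in the abstract can be sketched as follows. This is a simplified data-flow illustration, not the paper's architecture: the class name `GenericBlock`, the layer sizes, and the untrained random weights are all assumptions; each block consumes the residual of the input not yet explained (backward link) and contributes a partial forecast that is summed (forward link).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class GenericBlock:
    """One N-BEATS-style block: a small fully-connected stack whose
    hidden state is mapped linearly to a backcast and a forecast.
    Sizes and initialisation here are illustrative only."""
    def __init__(self, backcast_len, forecast_len, hidden=16, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        def lin(n_in, n_out):
            return rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
        self.W1 = lin(backcast_len, hidden)
        self.W2 = lin(hidden, hidden)
        self.Wb = lin(hidden, backcast_len)   # backcast head
        self.Wf = lin(hidden, forecast_len)   # forecast head

    def __call__(self, x):
        h = relu(relu(x @ self.W1) @ self.W2)
        return h @ self.Wb, h @ self.Wf       # (backcast, forecast)

def nbeats_forward(blocks, x):
    """Doubly residual stacking over a list of blocks."""
    residual = x
    forecast = 0.0
    for block in blocks:
        backcast, partial = block(residual)
        residual = residual - backcast        # backward residual link
        forecast = forecast + partial         # forward residual link
    return forecast
```

Stacking several such blocks and summing their partial forecasts is what lets each block specialize on whatever signal the earlier blocks left unexplained.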


An optimal unrestricted learning procedure

Mendelson, Shahar

arXiv.org Machine Learning

We study learning problems in the general setup, for arbitrary classes of functions $F$, distributions $X$ and targets $Y$. Because proper learning procedures, i.e., procedures that are only allowed to select functions in $F$, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that $F$ is convex), we consider unrestricted learning procedures, that is, procedures that are free to choose functions outside the given class $F$. We present a new unrestricted procedure that is optimal in a very strong sense: it attains the best possible accuracy/confidence tradeoff for (almost) any triplet $(F,X,Y)$, including in heavy-tailed problems. Moreover, the tradeoff the procedure attains coincides with what one would expect if $F$ were convex, even when $F$ is not; and when $F$ happens to be convex, the procedure is proper; thus, the unrestricted procedure is actually optimal in both realms, for convex classes as a proper procedure and for arbitrary classes as an unrestricted procedure. The notion of optimality we consider is problem specific: our procedure performs with the best accuracy/confidence tradeoff one can hope to achieve for each individual problem. As such, it is a significantly stronger property than the standard `worst-case' notion, in which one considers optimality as the best uniform estimate that holds for a relatively large family of problems. Thanks to the sharp and problem-specific estimates we obtain, classical, worst-case bounds are immediate outcomes of our main result.


Leveraging Video Descriptions to Learn Video Question Answering

Zeng, Kuo-Hao (Stanford University and National Tsing Hua University) | Chen, Tseng-Hung (National Tsing Hua University) | Chuang, Ching-Yao (National Tsing Hua University) | Liao, Yuan-Hong (National Tsing Hua University) | Niebles, Juan Carlos (Stanford University) | Sun, Min (National Tsing Hua University)

AAAI Conferences

We propose a scalable approach to learn video-based question answering (QA): to answer a free-form natural language question about the contents of a video. Our approach automatically harvests a large number of videos and descriptions freely available online. Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated. Next, we use these candidate QA pairs to train a number of video-based QA methods extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et al. 2015), and SS (Venugopalan et al. 2015). In order to handle non-perfect candidate QA pairs, we propose a self-paced learning procedure to iteratively identify them and mitigate their effects in training. Finally, we evaluate performance on manually generated video-based QA pairs. The results show that our self-paced learning procedure is effective, and the extended SS model outperforms various baselines.
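The self-paced idea of iteratively identifying non-perfect QA pairs and mitigating their effect can be sketched as a generic selection loop. This is an illustrative sketch, not the authors' exact procedure: `self_paced_train`, the `train`/`loss` callables, and the keep-fraction schedule are all assumptions.

```python
def self_paced_train(pairs, train, loss, rounds=3, keep_frac=0.8):
    """Generic self-paced loop: repeatedly train, then keep only the
    examples the current model fits with the lowest loss, so suspect
    auto-generated pairs are gradually filtered out of training.

    train(pairs) -> model; loss(model, pair) -> float (lower = easier).
    """
    selected = list(pairs)
    model = train(selected)
    for _ in range(rounds):
        # rank pairs from easiest to hardest under the current model
        scored = sorted(selected, key=lambda p: loss(model, p))
        keep = max(1, int(len(scored) * keep_frac))
        selected = scored[:keep]      # drop hardest (likely noisy) pairs
        model = train(selected)
    return model, selected
```

With a toy "model" that is just the mean of its training values, the loop quickly discards an outlier pair and converges on the clean data.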